-
Notifications
You must be signed in to change notification settings - Fork 421
Do an initial poll of the ChannelManager
write future immediately
#3977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do an initial poll of the ChannelManager
write future immediately
#3977
Conversation
In 7d856fe5b05d69b301fdb8a48352f3f3a16efca9 we moved to doing all of our persistence operations in the (async) background processor in parallel. This is great for slow persistence operations (eg with remote persistence). However, the `ChannelManager` persistence is very latency-sensitive (delays may lead to accidental force-closures today) and thus we don't really want to wait on network graph pruning or scorer time-stepping (which can be somewhat slow). Instead, we keep the new bulk-polling of the persistence operations but immediately do a single step of the `ChannelManager` persistence future after it is created. This should give it a chance to get going - initializing whatever network sends or connections need to happen - before we get busy doing CPU-intensive stuff. While it may not actually make the persist happen in full, it at least gives it a chance, and should make as much progress as possible until a network resources is exhausted (eg send buffer is full), at which point we need to wait on the network which is almost certainly slow than the 1ms or so it will take to time-step the scorer.
👋 Thanks for assigning @joostjager as a reviewer! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3977 +/- ##
==========================================
- Coverage 88.91% 88.89% -0.03%
==========================================
Files 174 174
Lines 124232 124240 +8
Branches 124232 124240 +8
==========================================
- Hits 110455 110437 -18
- Misses 11301 11320 +19
- Partials 2476 2483 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
// Because persisting the ChannelManager is important to avoid accidental | ||
// force-closures, go ahead and poll the future once before we do slightly more | ||
// CPU-intensive tasks in the form of NetworkGraph pruning or scorer time-stepping | ||
// below. This will get it moving but won't block us for too long if the underlying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is needed, shouldn't we stay on the safe side and await cm persistence separately? (not include it in the joiner)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so - time-stepping the scorer takes something on the order of 1ms, so its going to be dwarfed by any network latency anyway. On the flip side, remote persistence may take something like 500ms if we have a bit high internet latency and there's a few RTTs. Thus, parallelizing the persistences across all the cases is really important for overall application latency, but at the same time we want to get things moving quick, so its worth polling once to get network stuff going.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To some extent this is really because we're trying to optimize for two drastically different cases - async persist with a local DB that is 0.5ms away and async persist with a remote VSS store that is 500ms away. But I do think this is basically a free way to optimize for both - there's no real cost to polling once, aside from a bit of extra code, and it lets us get ChannelManager persisted fast but doesn't stall the whole thing on net.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, buying this.
👋 The first review has been submitted! Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer. |
Just gonna land this, its simple and straightforward. |
In 7d856fe5b05d69b301fdb8a48352f3f3a16efca9 we moved to doing all of our persistence operations in the (async) background processor in parallel. This is great for slow persistence operations (eg with remote persistence). However, the
ChannelManager
persistence is very latency-sensitive (delays may lead to accidental force-closures today) and thus we don't really want to wait on network graph pruning or scorer time-stepping (which can be somewhat slow).Instead, we keep the new bulk-polling of the persistence operations but immediately do a single step of the
ChannelManager
persistence future after it is created. This should give it a chance to get going - initializing whatever network sends or connections need to happen - before we get busy doing CPU-intensive stuff. While it may not actually make the persist happen in full, it at least gives it a chance, and should make as much progress as possible until a network resources is exhausted (eg send buffer is full), at which point we need to wait on the network which is almost certainly slow than the 1ms or so it will take to time-step the scorer.